In this example we will examine multivariate forecasting models using mvgam. Here we will access monthly search volume data from Google Trends, focusing on the relative importance of search terms related to tick paralysis in Queensland, Australia.
library(mvgam)
library(tidyr)
if (!require(gtrendsR)) {
  install.packages("gtrendsR")
}
terms <- c("tick bite", "tick paralysis",
           "dog tick", "paralysis tick dog")
trends <- gtrendsR::gtrends(terms, geo = "AU-QLD",
                            time = "all", onlyInterest = TRUE)
Google Trends modified their algorithm for extracting search volume data in 2012, so we filter the series to only include observations after this point in time.
gtest <- trends$interest_over_time %>%
  tidyr::spread(keyword, hits) %>%
  dplyr::select(-geo, -time, -gprop, -category) %>%
  dplyr::mutate(date = lubridate::ymd(date)) %>%
  dplyr::mutate(year = lubridate::year(date)) %>%
  dplyr::filter(year > 2012) %>%
  dplyr::select(-year)
Convert to an xts object and then to the required mvgam format, holding out the final 10% of observations as the test data.
series <- xts::xts(x = gtest[, -1], order.by = gtest$date)
trends_data <- series_to_mvgam(series, freq = 12,
train_prop = 0.9)
Plot the series to see how similar their seasonal shapes are over time.
plot(series, legend.loc = "topleft")
Now we will fit an mvgam model with shared seasonality and random intercepts per series. An advantage of using JAGS for sampling is that we can implement truncated likelihood functions to recognise known bounds for particular parameters. As we know that Google Trends relative search volumes cannot exceed 100, we specify this as the upper bound for each series (note that these truncated likelihoods are much slower to estimate in JAGS, so only use them if required). Here we assume the dynamic trends can be represented using three latent factors that each follow an AR3 process, and we assume a Negative Binomial distribution for the response. At present each series is presumed to show the same degree of overdispersion, but this assumption will be relaxed in future versions.
trends_mod <- mvjagam(data_train = trends_data$data_train,
                      data_test = trends_data$data_test,
                      formula = y ~ s(series, bs = "re") +
                        s(season, k = 12, m = 2, bs = "cc"),
                      knots = list(season = c(0.5, 12.5)),
                      use_lv = TRUE, trend_model = "AR3",
                      n_lv = 3, family = "nb", n.burnin = 2000,
                      upper_bounds = rep(100, length(terms)),
                      auto_update = FALSE)
## Fitting a multivariate GAM with latent dynamic factors for the trends...
## module glm loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 392
## Unobserved stochastic nodes: 842
## Total graph size: 13246
##
## Initializing model
Have a look at the returned JAGS model file to see how the dynamic factors are incorporated. Notice that the precisions of the factors, together with each factor's set of loadings, are penalised using a regularised horseshoe prior. This should help protect against specifying more factors than are needed to represent the dependent temporal dynamics.
trends_mod$model_file
## [1] "model {"
## [2] ""
## [3] " ## GAM linear predictor"
## [4] " eta <- X %*% b"
## [5] ""
## [6] " ## Mean expectations"
## [7] " for (i in 1:n) {"
## [8] " for (s in 1:n_series) {"
## [9] " mu[i, s] <- exp(eta[ytimes[i, s]] + trend[i, s])"
## [10] " }"
## [11] " }"
## [12] ""
## [13] " ## Latent factors evolve as time series with shared precision;"
## [14] " ## the penalty terms are from the regularized horseshoe (below), which"
## [15] " ## act here to force any un-needed factors to evolve as flat lines"
## [16] " tau_fac ~ dnorm(0, 0.1)T(0,)"
## [17] " for(j in 1:n_lv){"
## [18] " LV[1, j] ~ dnorm(0, tau_fac* penalty[j])"
## [19] " }"
## [20] ""
## [21] " for(j in 1:n_lv){"
## [22] " LV[2, j] ~ dnorm(phi[j] + ar1[j]*LV[1, j], tau_fac* penalty[j])"
## [23] " }"
## [24] ""
## [25] " for(j in 1:n_lv){"
## [26] " LV[3, j] ~ dnorm(phi[j] + ar1[j]*LV[2, j] + ar2[j]*LV[1, j], tau_fac* penalty[j])"
## [27] " }"
## [28] ""
## [29] " for(i in 4:n){"
## [30] " for(j in 1:n_lv){"
## [31] " LV[i, j] ~ dnorm(phi[j] + ar1[j]*LV[i - 1, j] +"
## [32] " ar2[j]*LV[i - 2, j] + ar3[j]*LV[i - 3, j], tau_fac* penalty[j])"
## [33] " }"
## [34] " }"
## [35] ""
## [36] " ## AR components"
## [37] " for (s in 1:n_lv){"
## [38] " phi[s] ~ dnorm(0, 10)"
## [39] " ar1[s] ~ dnorm(0, 10)"
## [40] " ar2[s] ~ dnorm(0, 10)"
## [41] " ar3[s] ~ dnorm(0, 10)"
## [42] " }"
## [43] ""
## [44] " ## Latent factor loadings are penalized using a regularized horseshoe prior"
## [45] " ## to allow loadings for entire factors to be 'dropped', reducing overfitting. Still"
## [46] " ## need to impose identifiability constraints"
## [47] ""
## [48] " ## Global shrinkage penalty (half cauchy, following Gelman et"
## [49] " ## al. 2008: A weakly informative default prior distribution for logistic and other"
## [50] " ## regression models)"
## [51] " lv_tau ~ dscaled.gamma(0.5, 5)"
## [52] ""
## [53] " ## Shrinkage penalties for each factor, which are also half cuachy to allow"
## [54] " ## the entire factor's set of coefficients to be squeezed toward zero if supported by"
## [55] " ## the data. The prior for individual factor penalties allows each factor to possibly"
## [56] " ## have a relatively large penalty, which shrinks the prior for that factor's loadings"
## [57] " ## substantially"
## [58] " for (j in 1:n_lv){"
## [59] " penalty[j] ~ dscaled.gamma(0.5, 0.5)"
## [60] " }"
## [61] ""
## [62] " ## Upper triangle of loading matrix set to zero"
## [63] " for(j in 1:(n_lv - 1)){"
## [64] " for(j2 in (j + 1):n_lv){"
## [65] " lv_coefs[j, j2] <- 0"
## [66] " }"
## [67] " }"
## [68] ""
## [69] " ## Positive constraints on loading diagonals"
## [70] " for(j in 1:n_lv) {"
## [71] " lv_coefs[j, j] ~ dnorm(0, lv_tau * penalty[j])T(0, 1);"
## [72] " }"
## [73] ""
## [74] " ## Lower diagonal free"
## [75] " for(j in 2:n_lv){"
## [76] " for(j2 in 1:(j - 1)){"
## [77] " lv_coefs[j, j2] ~ dnorm(0, lv_tau * penalty[j2])T(-1, 1);"
## [78] " }"
## [79] " }"
## [80] ""
## [81] " ## Other elements also free"
## [82] " for(j in (n_lv + 1):n_series) {"
## [83] " for(j2 in 1:n_lv){"
## [84] " lv_coefs[j, j2] ~ dnorm(0, lv_tau * penalty[j2])T(-1, 1);"
## [85] " }"
## [86] " }"
## [87] ""
## [88] " ## Trend evolution for the series depends on latent factors"
## [89] " for (i in 1:n){"
## [90] " for (s in 1:n_series){"
## [91] " trend[i, s] <- inprod(lv_coefs[s,], LV[i,])"
## [92] " }"
## [93] " }"
## [94] ""
## [95] " ## Negative binomial likelihood functions"
## [96] " for (i in 1:n) {"
## [97] " for (s in 1:n_series) {"
## [98] " y[i, s] ~ dnegbin(rate[i, s], r)T(, upper_bound[s]);"
## [99] " rate[i, s] <- ifelse((r / (r + mu[i, s])) < min_eps, min_eps,"
## [100] " (r / (r + mu[i, s])))"
## [101] " }"
## [102] " }"
## [103] " r ~ dexp(0.001)"
## [104] ""
## [105] " ## Posterior predictions"
## [106] " for (i in 1:n) {"
## [107] " for (s in 1:n_series) {"
## [108] " ypred[i, s] ~ dnegbin(rate[i, s], r)T(, upper_bound[s])"
## [109] " }"
## [110] " }"
## [111] " "
## [112] " ## Parametric effect priors CHECK tau=1/24^2 is appropriate!"
## [113] "for (i in 1:1) { b[i] ~ dnorm(0, 0.25) }"
## [114] " ## prior for s(series)... "
## [115] " for (i in c(2:5)) { b[i] ~ dnorm(0, lambda[1]) }"
## [116] " ## prior for s(season)... "
## [117] " K2 <- S2[1:10,1:10] * lambda[2] "
## [118] " b[6:15] ~ dmnorm(zero[6:15],K2) "
## [119] " ## smoothing parameter priors CHECK..."
## [120] " for (i in 1:2) {"
## [121] " lambda[i] ~ dgamma(.05,.005)"
## [122] " rho[i] <- log(lambda[i])"
## [123] " }"
## [124] "}"
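As a quick illustration of how the horseshoe-style penalty in the model file above behaves, the following standalone simulation (an illustrative sketch, not code extracted from mvgam) draws penalties from a half-Cauchy distribution and inspects the implied Normal prior for a loading. Note that JAGS parameterises `dnorm` by precision, so a large penalty means a small prior standard deviation:

```r
# Standalone simulation (not mvgam internals): half-Cauchy penalties and
# the implied prior for a factor loading with precision = lv_tau * penalty
set.seed(2)
penalty <- abs(rcauchy(10000, location = 0, scale = 0.5))
lv_tau <- 1

# JAGS dnorm uses precision, so the prior sd is 1 / sqrt(tau * penalty)
loading <- rnorm(10000, mean = 0, sd = 1 / sqrt(lv_tau * penalty))

# Large penalties squeeze loadings toward zero, while the heavy Cauchy
# tails still permit the occasional large loading
mean(abs(loading) < 0.5)
```

The same mechanism, applied to a whole factor's loadings at once, is what allows an unneeded factor to be effectively 'dropped'.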
Given that these series could potentially follow a hierarchical seasonality, we will also trial a slightly more complex model with an extra smoothing term per series that allows its seasonal curve to deviate from the global seasonal smooth.
trends_mod2 <- mvjagam(data_train = trends_data$data_train,
                       data_test = trends_data$data_test,
                       formula = y ~ s(series, bs = "re") +
                         s(season, k = 12, m = 2, bs = "cc") +
                         s(season, by = series, k = 5, m = 1),
                       knots = list(season = c(0.5, 12.5)),
                       use_lv = TRUE, trend_model = "AR3",
                       n_lv = 3, family = "nb", n.burnin = 2000,
                       upper_bounds = rep(100, length(terms)),
                       auto_update = FALSE)
## Fitting a multivariate GAM with latent dynamic factors for the trends...
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 392
## Unobserved stochastic nodes: 850
## Total graph size: 20326
##
## Initializing model
Compare the models using rolling forecast DRPS evaluation. Here we focus on near-term forecasts (fc_horizon = 3) when comparing model performance.
compare_mvgams(trends_mod, trends_mod2, fc_horizon = 3,
n_cores = 3, n_evaluations = 10)
## DRPS summaries per model (lower is better)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Model 1 2.628033 6.099006 7.919379 9.384568 11.44073 20.86273
## Model 2 2.419889 6.064620 8.045416 9.629487 13.38588 24.88980
##
## 90% interval coverages per model (closer to 0.9 is better)
## Model 1 0.9166667
## Model 2 0.9083333
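For intuition about what the DRPS is measuring, here is a minimal sketch (not the scoring code used internally by compare_mvgams) of the Discrete Rank Probability Score for a single observation, computed from the empirical forecast CDF of posterior draws:

```r
# Minimal sketch of the Discrete Rank Probability Score for one observation;
# a perfectly sharp, correct forecast scores 0, and the score grows as the
# forecast distribution drifts away from the realised count
drps_score <- function(draws, y, upper = 100) {
  ks <- 0:upper
  # empirical forecast CDF evaluated at each possible count
  Fk <- vapply(ks, function(k) mean(draws <= k), numeric(1))
  # step function indicating whether the observation has occurred by count k
  Ik <- as.numeric(y <= ks)
  sum((Fk - Ik)^2)
}

set.seed(1)
draws <- rnbinom(1000, mu = 20, size = 5)
drps_score(draws, y = 18)
```

Because the score integrates over the whole forecast CDF, it rewards both calibration and sharpness, which is why it is a sensible basis for the rolling evaluations above.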
Model 1 (with shared seasonality) is slightly preferred, suggesting there is insufficient evidence that variation in the seasonal curves of these series is useful for forecasting. This is likely because the noisiness of these series makes it difficult to estimate their individual seasonal patterns, while any fluctuations through time can be more readily captured by the flexibility of the latent trends.
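The point about noisy seasonal signals can be quantified with the standard 'strength of seasonality' measure from an STL decomposition (a sketch on simulated data; this helper is an illustration, not an mvgam function):

```r
# Illustrative helper (not part of mvgam): proportion of detrended
# variation explained by the seasonal component of an STL decomposition
seasonal_strength <- function(x, freq = 12) {
  dec <- stl(ts(x, frequency = freq), s.window = "periodic")$time.series
  detrended <- dec[, "seasonal"] + dec[, "remainder"]
  max(0, 1 - var(dec[, "remainder"]) / var(detrended))
}

set.seed(1)
clean <- 5 * sin(2 * pi * (1:120) / 12) + rnorm(120, sd = 0.5)
noisy <- 5 * sin(2 * pi * (1:120) / 12) + rnorm(120, sd = 10)
seasonal_strength(clean)  # close to 1
seasonal_strength(noisy)  # much lower
```

A low per-series seasonal strength is exactly the situation in which the extra by-series smooths in Model 2 have little reliable signal to estimate.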
Inspect the model summary (note again that p-value approximations are a work in progress here and so may not be totally reliable).
summary_mvgam(trends_mod)
## GAM formula:
## y ~ s(series, bs = "re") + s(season, k = 12, m = 2, bs = "cc")
##
## Family:
## Negative Binomial
##
## Number of latent factors:
## 3
##
## N series:
## 4
##
## N observations per series:
## 98
##
## Dispersion parameter estimate:
## 2.5% 50% 97.5% Rhat n.eff
## r 23.73372 646.6896 3907.138 1.02 480
##
## GAM smooth term approximate significances:
## edf Ref.df Chi.sq p-value
## s(series) 2.992 4.000 11.45 0.00352 **
## s(season) 7.228 10.000 86.21 7.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 1.408466145 1.75132280 2.11214961 1.10 16
## s(series).1 0.180222654 0.56468903 1.00977644 1.00 20
## s(series).2 -1.591024633 -1.10185230 -0.51938302 1.01 32
## s(series).3 -0.726072907 -0.10691447 0.43444527 1.01 22
## s(series).4 0.206968820 0.71697213 1.07196570 1.14 26
## s(season).1 -0.417486897 -0.27591651 -0.10083377 1.34 58
## s(season).2 -0.757532450 -0.53189649 -0.34320433 1.01 43
## s(season).3 -0.627605159 -0.37693680 -0.16617397 1.32 40
## s(season).4 -0.308590469 -0.13658301 0.04997107 1.07 53
## s(season).5 -0.360693264 -0.14820142 0.01124474 1.29 50
## s(season).6 -0.097957642 0.05938489 0.21987695 1.15 68
## s(season).7 0.193508205 0.34703800 0.50537433 1.05 53
## s(season).8 0.560648443 0.74084098 0.92295080 1.04 42
## s(season).9 0.321757184 0.50051777 0.72134568 1.08 37
## s(season).10 0.008704949 0.12653659 0.25954870 1.01 86
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(series) -1.368971 0.417526 1.720728 1.00 1046
## s(season) 3.728252 5.339130 6.443365 1.15 109
##
## Latent trend drift (phi) and AR parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## phi[1] -0.51613904 0.013596841 0.4975300 1.11 50
## phi[2] -0.72973348 -0.183004242 0.3002672 1.01 43
## phi[3] -0.31169051 0.204076486 0.7316597 1.01 47
## ar1[1] -0.30281572 -0.009181438 0.3150056 1.01 668
## ar1[2] -0.04454036 0.229564674 0.5483573 1.01 444
## ar1[3] -0.35933856 -0.037721482 0.2950201 1.02 487
## ar2[1] -0.44329863 -0.105552687 0.1979554 1.01 423
## ar2[2] -0.22269660 0.097541431 0.4012076 1.00 559
## ar2[3] -0.29621850 -0.020954695 0.2492425 1.00 591
## ar3[1] -0.17821111 0.132515729 0.4434303 1.01 714
## ar3[2] -0.36630830 -0.099871109 0.1932555 1.00 390
## ar3[3] -0.22250281 0.101582421 0.3870088 1.03 337
##
Look at Dunn-Smyth residuals for some series from this preferred model to ensure the Negative Binomial is appropriate and that our dynamic factor process has captured most of the temporal dependencies in the observations.
plot_mvgam_resids(trends_mod, 1)
plot_mvgam_resids(trends_mod, 2)
plot_mvgam_resids(trends_mod, 3)
plot_mvgam_resids(trends_mod, 4)
Perform posterior predictive checks to see if the model is able to simulate data that looks realistic and unbiased by examining simulated kernel densities for posterior predictions (yhat) compared to the density of the observations (y). This will be particularly useful for examining whether the shared overdispersion parameter is able to produce realistic looking simulations for each individual series.
plot_mvgam_ppc(trends_mod, series = 1, type = "density")
plot_mvgam_ppc(trends_mod, series = 2, type = "density")
plot_mvgam_ppc(trends_mod, series = 3, type = "density")
plot_mvgam_ppc(trends_mod, series = 4, type = "density")
Look at traceplots for the smoothing parameters (rho) and latent trend parameters.
plot_mvgam_trace(object = trends_mod, param = "rho")
plot_mvgam_trace(object = trends_mod, param = "trend")
Plot posterior predictive distributions for the training and testing periods for each series.
plot_mvgam_fc(object = trends_mod, series = 1,
data_test = trends_data$data_test)
plot_mvgam_fc(object = trends_mod, series = 2,
data_test = trends_data$data_test)
plot_mvgam_fc(object = trends_mod, series = 3,
data_test = trends_data$data_test)
plot_mvgam_fc(object = trends_mod, series = 4,
data_test = trends_data$data_test)
Plot posterior distributions for the latent trend estimates, again for the training and testing periods.
plot_mvgam_trend(object = trends_mod, series = 1,
data_test = trends_data$data_test)
plot_mvgam_trend(object = trends_mod, series = 2,
data_test = trends_data$data_test)
plot_mvgam_trend(object = trends_mod, series = 3,
data_test = trends_data$data_test)
plot_mvgam_trend(object = trends_mod, series = 4,
data_test = trends_data$data_test)
Given that we fit a model with a shared seasonal pattern, the seasonal smooths for each series will be identical. Plot it here.
plot_mvgam_smooth(object = trends_mod, series = 1,
smooth = "season")
Plot posterior mean estimates of the latent trend correlations. These correlations are more useful for inspecting dependencies (for example, through ordinations) than the latent factor loadings themselves, because the orders of the loadings (although constrained for identifiability purposes) can vary from chain to chain.
correlations <- lv_correlations(object = trends_mod)
library(ggplot2)
mean_correlations <- correlations$mean_correlations
mean_correlations[upper.tri(mean_correlations)] <- NA
mean_correlations <- data.frame(mean_correlations)
ggplot(mean_correlations %>%
         tibble::rownames_to_column("series1") %>%
         tidyr::pivot_longer(-c(series1), names_to = "series2",
                             values_to = "Correlation"),
       aes(x = series1, y = series2)) +
  geom_tile(aes(fill = Correlation)) +
  scale_fill_gradient2(low = "darkred", mid = "white", high = "darkblue",
                       midpoint = 0,
                       breaks = seq(-1, 1, length.out = 5),
                       limits = c(-1, 1),
                       name = "Trend\ncorrelation") +
  labs(x = "", y = "") +
  theme_dark() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
There is certainly some evidence of positive trend correlations for a few of these search terms, which is not surprising given how similar some of them are and how closely linked they should be to interest about tick paralysis in Queensland. Plot some STL decompositions of these series to see whether these trends are noticeable in the data.
plot(stl(ts(as.vector(series$`tick paralysis`),
frequency = 12), "periodic"))
plot(stl(ts(as.vector(series$`paralysis tick dog`),
frequency = 12), "periodic"))
plot(stl(ts(as.vector(series$`dog tick`),
frequency = 12), "periodic"))
plot(stl(ts(as.vector(series$`tick bite`),
frequency = 12), "periodic"))
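Shared low-frequency movement can also be checked in a model-free way by correlating STL trend components directly. The sketch below uses synthetic stand-ins for the search series (an assumption of this example, since the real data require a Google Trends download):

```r
# Simulation sketch (synthetic stand-ins for the search series): two series
# sharing a common random-walk trend plus seasonality and noise
set.seed(42)
months <- 1:120
shared <- cumsum(rnorm(120, sd = 0.3))
y1 <- ts(10 + 3 * sin(2 * pi * months / 12) + shared + rnorm(120),
         frequency = 12)
y2 <- ts(12 + 2 * sin(2 * pi * months / 12) + shared + rnorm(120),
         frequency = 12)

# The shared movement shows up as strongly correlated STL trend components
trend1 <- stl(y1, s.window = "periodic")$time.series[, "trend"]
trend2 <- stl(y2, s.window = "periodic")$time.series[, "trend"]
cor(trend1, trend2)
```

Applying the same idea to the real `series` object gives a quick descriptive check on the posterior trend correlations estimated by the model.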
A logical next step for this analysis would be to trial varying overdispersion parameters for each series. This may be necessary as the residual diagnostic plots and forecast period posterior predictive checks suggest that the estimated Negative Binomial overdispersion parameter is more suitable for some series than for others:
plot_mvgam_ppc(trends_mod, series = 1, type = "density",
data_test = trends_data$data_test)
plot_mvgam_ppc(trends_mod, series = 1, type = "mean",
data_test = trends_data$data_test)
plot_mvgam_ppc(trends_mod, series = 2, type = "density",
data_test = trends_data$data_test)
plot_mvgam_ppc(trends_mod, series = 2, type = "mean",
data_test = trends_data$data_test)
plot_mvgam_ppc(trends_mod, series = 3, type = "density",
data_test = trends_data$data_test)
plot_mvgam_ppc(trends_mod, series = 3, type = "mean",
data_test = trends_data$data_test)
plot_mvgam_ppc(trends_mod, series = 4, type = "density",
data_test = trends_data$data_test)
plot_mvgam_ppc(trends_mod, series = 4, type = "mean",
data_test = trends_data$data_test)
Other next steps could involve devising a more goal-specific set of posterior predictive checks (see this paper by Gelman et al. and relevant works by Betancourt for examples), or comparing out-of-sample Discrete Rank Probability Scores for this model against versions with simpler latent trends (i.e. AR2, AR1, Random Walk).